Search Result


De novo peptide sequencing by tandem mass spectrometry based on graph convolutional neural network

MOU Changning, WANG Haipeng, ZHOU Piyu, HOU Xinhang

Journal of Computer Applications 2021, 41 (9): 2773-2779. DOI: 10.11772/j.issn.1001-9081.2020111875


In proteomics, de novo sequencing is one of the most important methods for peptide sequencing by tandem mass spectrometry. Its advantage is that it does not depend on any protein database, so it plays a key role in determining protein sequences of unknown species, in monoclonal antibody sequencing, and in other fields. However, because of its complexity, the accuracy of de novo sequencing is much lower than that of database search methods, which limits its wide application. To address this low accuracy, denovo-GCN, a de novo sequencing method based on the Graph Convolutional neural Network (GCN), was proposed. In this method, the relationships between peaks in the mass spectrum were expressed as a graph structure, and peak features were extracted for each corresponding peptide cleavage site. The amino acid type at the current cleavage site was then predicted by the GCN, and a complete sequence was built up step by step. Three significant parameters affecting the model were determined experimentally: the number of GCN layers, the combination of ion types, and the number of spectral peaks used for sequencing. Datasets from a wide variety of species were used for experimental comparison. Experimental results show that the peptide-level recall of denovo-GCN is 4.0 to 21.1 percentage points higher than those of the graph theory-based methods Novor and pNovo, and 2.1 to 10.7 percentage points higher than that of DeepNovo, which adopts a Convolutional Neural Network (CNN) and a Long Short-Term Memory (LSTM) network.
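To make the graph-convolution step concrete, the sketch below implements one layer of the widely used propagation rule H' = ReLU(D^(-1/2) (A + I) D^(-1/2) H W) in plain Python. The abstract does not give the exact denovo-GCN architecture, so the rule, the function names (`normalized_adjacency`, `gcn_layer`), and the toy peak graph are assumptions chosen for illustration only.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def normalized_adjacency(adj):
    """Add self-loops, then scale symmetrically by node degree."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(adj, features, weights):
    """One graph-convolution layer followed by ReLU."""
    propagated = matmul(matmul(normalized_adjacency(adj), features), weights)
    return [[max(0.0, v) for v in row] for row in propagated]


# Hypothetical toy graph: two spectrum peaks joined by one edge,
# each carrying a single feature value.
adj = [[0.0, 1.0], [1.0, 0.0]]
features = [[1.0], [3.0]]
weights = [[1.0]]
print(gcn_layer(adj, features, weights))
```

Each output row mixes a peak's own features with those of its neighbours, which is how relationships between peaks can inform the prediction at a cleavage site.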


Peptide spectrum match scoring algorithm based on multi-head attention mechanism and residual neural network

MIN Xin, WANG Haipeng, MOU Changning

Journal of Computer Applications 2020, 40 (6): 1830-1836. DOI: 10.11772/j.issn.1001-9081.2019101880


Peptide spectrum match scoring algorithms play a key role in peptide sequence identification, but traditional scoring algorithms cannot make full use of the peptide fragmentation pattern when scoring. To solve this problem, a multi-classification probability sum scoring algorithm combined with a representation of peptide sequence information, called deepScore-α, was proposed. In this algorithm, no second-pass scoring based on global information was performed, and there was no restriction on the method used to calculate the similarity between the theoretical and experimental mass spectra. A one-dimensional residual network was used to extract the underlying information of the sequence, and the effects of different peptide bonds on the cleavage of the current peptide bond were then integrated through the multi-head attention mechanism to generate the final probability matrix of the fragment ion relative intensity distribution. The final peptide spectrum match score was then calculated by combining this matrix with the actual relative intensities of the fragment ions of the peptide sequence. The algorithm was compared with Comet and MSGF+, two common open-source identification tools. The results show that at a False Discovery Rate (FDR) of 0.01 on a human proteome dataset, the number of peptide sequences retained by deepScore-α is increased by about 14%, and the Top1 hit ratio (the proportion of spectra whose highest-scoring peptide sequence is the correct one) is increased by about 5 percentage points. The generalization performance test of the model trained on the human ProteomeTools2 dataset shows that the number of peptide sequences retained by deepScore-α at an FDR of 0.01 is improved by about 7%, the Top1 hit ratio is increased by about 5 percentage points, and the number of Top1 identifications coming from the decoy library is decreased by about 60%. Experimental results prove that the algorithm can retain more peptide sequences at a lower FDR, improve the Top1 hit ratio, and generalize well.
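The "number of peptide sequences retained at FDR 0.01" can be illustrated with the common target-decoy estimate, in which the FDR above a score cutoff is approximated by the ratio of decoy hits to target hits. The abstract does not state which estimator was used, so this formula, the function name `targets_at_fdr`, and the `(score, is_decoy)` input format are assumptions for illustration.

```python
def targets_at_fdr(psms, max_fdr=0.01):
    """Count target matches kept at the loosest cutoff meeting max_fdr.

    psms is a list of (score, is_decoy) pairs, one per spectrum match.
    The FDR at a cutoff is estimated as decoys / targets above it
    (an assumed estimator; score ties are not treated specially).
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = best = 0
    for _score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Remember the largest target count whose estimated FDR is acceptable.
        if targets > 0 and decoys / targets <= max_fdr:
            best = targets
    return best


# Hypothetical scores: three target matches and one decoy match.
psms = [(10.0, False), (9.0, False), (8.0, True), (7.0, False)]
print(targets_at_fdr(psms, max_fdr=0.01))
print(targets_at_fdr(psms, max_fdr=0.5))
```

A scoring function that ranks correct matches above decoys pushes the first decoy further down the list, so more targets survive the same FDR cutoff.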


Newton-soft threshold iteration algorithm for robust principal component analysis

WANG Haipeng, JIANG Ailian, LI Pengxiang

Journal of Computer Applications 2020, 40 (11): 3133-3138. DOI: 10.11772/j.issn.1001-9081.2020030375


To reduce the time complexity of Robust Principal Component Analysis (RPCA), a Newton-Soft Threshold Iteration (NSTI) algorithm was proposed. Firstly, the NSTI model was constructed as the sum of the Frobenius norm of the low-rank matrix and the l₁-norm of the sparse matrix. Secondly, two different optimization methods were used to compute different parts of the model: the Newton method was used to quickly calculate the low-rank matrix, and the soft threshold iteration algorithm was used to quickly calculate the sparse matrix. The decomposition of the original data into a low-rank matrix and a sparse matrix was computed by alternating between the two optimization methods. Finally, the low-rank features of the original data were obtained. When the data scale is 5000 × 5000 and the rank of the low-rank matrix is 20, the NSTI algorithm improves the time efficiency by 24.6% and 45.5% compared with the Gradient Descent (GD) algorithm and the Low-Rank Matrix Fitting (LMaFit) algorithm, respectively. For foreground and background separation of a 180-frame video, NSTI takes 3.63 s, and its time efficiency is 78.7% and 82.1% higher than those of the GD and LMaFit algorithms, respectively. In the image denoising experiment, the NSTI algorithm takes 0.244 s, and the residual error between the image processed by NSTI and the original image is 0.3813, making the proposed algorithm 64.3% more time-efficient and 45.3% more accurate than the GD and LMaFit algorithms. Experimental results prove that the NSTI algorithm can effectively solve the RPCA problem and improve the time efficiency of RPCA.
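The soft-threshold step used for the sparse matrix can be sketched as follows. The operator S_tau(x) = sign(x) * max(|x| - tau, 0) is the standard proximal operator of the l₁-norm; the abstract does not give the exact NSTI objective or its Newton update, so only this sparse-matrix step is shown, and the names `soft_threshold`, `sparse_update`, and the parameter `tau` are assumptions.

```python
def soft_threshold(x, tau):
    """Shrink x toward zero by tau; values within [-tau, tau] become 0."""
    if x > tau:
        return x - tau
    if x < -tau:
        return x + tau
    return 0.0


def sparse_update(data, low_rank, tau):
    """Given the current low-rank estimate, update the sparse part
    by soft-thresholding the residual data - low_rank entrywise."""
    return [
        [soft_threshold(d - l, tau) for d, l in zip(d_row, l_row)]
        for d_row, l_row in zip(data, low_rank)
    ]


# Hypothetical 1x3 example: only the large residual survives thresholding.
data = [[5.0, 1.2, 0.9]]
low_rank = [[1.0, 1.0, 1.0]]
print(sparse_update(data, low_rank, tau=1.0))
```

In an alternating scheme, this update would be interleaved with a step that refits the low-rank matrix to the data minus the current sparse part.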


Real-time face pose estimation system based on 3D face model on Android mobile platform

WANG Haipeng, WANG Zhengliang, XU Weiwei, FAN Ran

Journal of Computer Applications 2015, 35 (8): 2321-2326. DOI: 10.11772/j.issn.1001-9081.2015.08.2321


Because face pose estimation systems have high performance requirements and could not run on mobile phones in real time, a real-time face pose estimation system was implemented for Android mobile phones. First, one frontal face image and one face image with a certain offset angle were captured by the camera to build a simple 3D face model with the Structure from Motion (SfM) algorithm. Second, the system matched feature points extracted from the real-time face image to the 3D face model, and the 3D face pose parameters were obtained by the POSIT (Pose from Orthography and Scaling with ITeration) algorithm. Finally, the 3D face model was displayed on the Android device in real time using OpenGL (Open Graphics Library). The experimental results showed that the speed of detecting and displaying the face pose reached 20 frames/s on real-time video, which is close to that of a 3D face pose estimation algorithm based on affine correspondence running on a computer, and that the speed of detection on a large number of image sequences reached 50 frames/s. The results indicate that the system satisfies both the performance requirement of Android mobile phones and the real-time requirement of face pose detection.
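POSIT starts from a scaled orthographic approximation of the camera: model points are rotated, their depth is dropped, and the result is multiplied by a single scale factor. The sketch below shows only that projection model for a rotation about the vertical axis; it is not the paper's implementation or the iterative POSIT solver, and the function name and the sample points are hypothetical.

```python
import math


def project_scaled_orthographic(points, yaw_deg, scale):
    """Rotate 3D points about the y axis by yaw_deg, then project them
    with scaled orthography: drop depth and multiply x, y by scale."""
    yaw = math.radians(yaw_deg)
    c, s = math.cos(yaw), math.sin(yaw)
    projected = []
    for x, y, z in points:
        x_rot = c * x + s * z  # x coordinate after rotation about the y axis
        projected.append((scale * x_rot, scale * y))
    return projected


# Hypothetical model points (for example, two eye corners and the nose tip).
model = [(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, -1.0, 1.0)]
print(project_scaled_orthographic(model, yaw_deg=0.0, scale=2.0))
print(project_scaled_orthographic(model, yaw_deg=30.0, scale=2.0))
```

Pose estimation runs this model in reverse: given the observed 2D feature points, it searches for the rotation and scale that best reproduce them.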


Fatigue behavior detection by mining keyboard and mouse events

WANG Tianben, WANG Haipeng, ZHOU Xingshe, NI Hongbo, LIN Qiang

Journal of Computer Applications 2014, 34 (1): 227-231. DOI: 10.11772/j.issn.1001-9081.2014.01.0227


Long-term continuous use of computers has negative effects on users' health. To detect users' fatigue level in a non-invasive manner, an approach that measures hand muscle fatigue from keyboard and mouse events was proposed. The method combined keying action matching, data noise filtering, and feature vector extraction and classification to collect and analyze the delay characteristics of both keying and hitting actions, from which the fatigue level of the hand muscles could be detected. With the detected fatigue level, friends belonging to the same virtual community on current social networks could be alerted in real time and persuaded to use computers in a more health-conscious way. In particular, an interesting conclusion was drawn: there is an obvious negative correlation between keying (hitting) delay and the fatigue level of the hand muscles. Experimental validation on two weeks of data collected from 15 participants shows that the proposed method is effective in detecting users' fatigue level and in distributing fatigue-related health information on a social network platform.
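The keying action matching step can be pictured as pairing each key-down event with the next key-up event for the same key and recording the time between them. The abstract does not describe the event format or the matching rules, so the `(timestamp_ms, key, action)` tuples and the function name `keying_delays` below are assumptions for illustration.

```python
def keying_delays(events):
    """Pair each key-down with the following key-up of the same key and
    return the press durations in milliseconds, in order of release.

    events is a time-ordered list of (timestamp_ms, key, action) tuples,
    where action is "down" or "up" (a hypothetical log format).
    """
    pending = {}  # key -> timestamp of its unmatched key-down
    delays = []
    for timestamp, key, action in events:
        if action == "down":
            # Keep the first down; later repeats come from key auto-repeat.
            pending.setdefault(key, timestamp)
        elif action == "up" and key in pending:
            delays.append(timestamp - pending.pop(key))
        # A key-up without a matching key-down is treated as noise and skipped.
    return delays


# Hypothetical event log with overlapping presses and one stray key-up.
log = [
    (0, "a", "down"),
    (50, "b", "down"),
    (80, "a", "up"),
    (120, "b", "up"),
    (130, "c", "up"),
]
print(keying_delays(log))
```

Delays collected this way could then be filtered for noise and summarized into the feature vectors that the classification stage works on.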